Topic Flow Model: A Graph Theoretic Temporal Topic Model For Noisy Mediums
نویسندگان
چکیده
In the modern era, data is being created faster than ever. Social media, in particular, churns out hundreds of millions of short documents a day. It would be useful to understand the underlying topics being discussed on popular channels of social media, and how those discussions evolve over time. There exist state of the art topic models that accurately classify texts large and small, but few attempt to follow topics through time, and many are adversely a ected by the large amount of noise in social media documents. We propose Topic Flow Model (TFM), a graph theoretic temporal topic model that identi es topics as they emerge, and tracks them through time as they persist, diminish, and re-emerge. TFM identi es topic words by capturing the changing relationship strength of words over time, and o ers solutions for dealing with ood words, i.e., domain speci c words that pollute topics. We conduct an extensive empirical analysis of TFM on Twitter data, newspaper articles, and synthetic data and nd that the topic accuracy and signal to noise ratio are better than state of the art methods. Index words: Data Mining, Topic Modeling, Social Graphs, Text Mining, Natural Language Processing, Graph Mining, Temporal Topics
منابع مشابه
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
متن کاملGraph-based Visual Saliency Model using Background Color
Visual saliency is a cognitive psychology concept that makes some stimuli of a scene stand out relative to their neighbors and attract our attention. Computing visual saliency is a topic of recent interest. Here, we propose a graph-based method for saliency detection, which contains three stages: pre-processing, initial saliency detection and final saliency detection. The initial saliency map i...
متن کاملA Wikipedia Based Semantic Graph Model for Topic Tracking in Blogsphere
There are two key issues for information diffusion in blogosphere: (1) blog posts are usually short, noisy and contain multiple themes, (2) information diffusion through blogosphere is primarily driven by the “word-of-mouth” effect, thus making topics evolve very fast. This paper presents a novel topic tracking approach to deal with these issues by modeling a topic as a semantic graph, in which...
متن کاملA Continuous-Time Model of Topic Co-occurrence Trends
Recent work in statistical topic models has investigated richer structures to capture either temporal or inter-topic correlations. This paper introduces a topic model that combines the advantages of two recently proposed models: (1) The Pachinko Allocation model (PAM), which captures arbitrary topic correlations with a directed acyclic graph (DAG), and (2) the Topics over Time model (TOT), whic...
متن کاملیک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017